Genetic risk scores and dementia risk across different ethnic groups in UK Biobank

Background: Genetic risk scores (GRS) for predicting dementia risk have mostly been used in people of European ancestry, with limited testing in other ancestry groups.
Methods: We conducted a logistic regression with all-cause dementia as the outcome and z-standardised GRS (z-GRS) as the exposure across diverse ethnic groups.
Findings: The frequency of APOE alleles varied across ethnic groups. Per standard deviation (SD) increase in the z-GRS including APOE, the odds ratio (OR) for dementia was 1.73 (95% CI 1.69–1.77). The z-GRS excluding APOE was also associated with increased dementia risk (OR 1.21 per SD increase, 95% CI 1.18–1.24), and there was no evidence that ethnicity modified this association. When APOE was excluded from the GRS, prediction of secondary outcomes was less robust in participants not of European ancestry.
Interpretation: z-GRS derived from studies in people of European ancestry can be used to quantify genetic risk in people from more diverse ancestry groups. Urgent work is needed to include people from diverse ancestries in future genetic risk studies so that this field becomes more inclusive.
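The core model in the Methods can be sketched as follows: a logistic regression of all-cause dementia on a z-standardised GRS, with the odds ratio reported per SD. This is a minimal illustration, not the authors' code. It fits the model by Newton-Raphson (IRLS) and uses Wald 95% confidence intervals. The optional `covariates` argument is a hypothetical placeholder, because the abstract does not state which covariates were adjusted for.

```python
import numpy as np

def zscore(x):
    """Standardise a score to mean 0, SD 1 so that coefficients are per-SD effects."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson / IRLS.
    X: n x p design matrix with an intercept column; y: 0/1 outcome.
    Returns coefficients and their Wald standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])    # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))     # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate for the SEs
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def or_per_sd(grs, dementia, covariates=None):
    """OR and Wald 95% CI for dementia per SD increase in the GRS.
    covariates: optional n x k array (hypothetical, e.g. age, sex, ancestry PCs)."""
    z = zscore(grs)
    cols = [np.ones_like(z), z]
    if covariates is not None:
        C = np.asarray(covariates, dtype=float).reshape(len(z), -1)
        cols.extend(C.T)                                 # one column per covariate
    X = np.column_stack(cols)
    beta, se = fit_logistic(X, np.asarray(dementia, dtype=float))
    b, s = beta[1], se[1]
    return np.exp(b), np.exp(b - 1.96 * s), np.exp(b + 1.96 * s)
```

The model is linear on the log-odds scale, so per-SD odds ratios compound multiplicatively. An OR of 1.73 per SD therefore corresponds to 1.73² ≈ 2.99 for a 2 SD difference in z-GRS.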

• First 32 principal components (PCs) calculated for the non-European dataset (computed with the GENESIS package)
• Eigenvalues of the provided PCs (computed with the GENESIS package)
• List of relatives up to 3rd degree in the non-European dataset, with the corresponding kinship values (computed with the GENESIS package)
• Summary statistic files (MAF and INFO score, computed with SNPTEST)
The participant IDs in all provided files come from project 9922.
We have also identified samples belonging to the following ancestry groups based on genetic data: East Asian (N=2,464), South Asian (N=8,964), Black (N=9,233) and admixed with predominantly European origin (N=11,251). We can share this information upon request.

Non-European dataset definition
38,598 participants were included in the non-European UK Biobank dataset after excluding participants with mismatches between submitted gender and genetically inferred sex, missingness/heterozygosity outliers, participants with excessive genetic relatedness, participants without QC metrics, individuals who had withdrawn their consent (based on the sample-QC information provided by the UK Biobank team), and European participants. The initial selection of European and non-European samples was done by Dr. Alina Farmaki, as described in Figure 1: samples with PC1 < 0 and PC2 > -10 (using the PCs provided by the UK Biobank team; green lines in Figure 1) were defined as European. The corresponding thresholds for White British participants were PC1 < -6.05 and PC2 > -2.02. In Figure 1, red depicts the European samples (~450,000) identified by k-means clustering with 7 clusters, and pink depicts the European samples (~456,000) identified with 6 clusters.
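The PC-threshold rule above can be written down directly. This sketch applies only the thresholds stated in the text. The `qc_pass` flag is a hypothetical stand-in for all the sample-QC exclusions listed (sex mismatch, missingness/heterozygosity outliers, excess relatedness, missing QC metrics, withdrawals). The k-means step shown in Figure 1 is not reproduced.

```python
import numpy as np

def classify_ancestry(ids, pc1, pc2, qc_pass):
    """Split QC-passing samples using the UK Biobank-provided PC thresholds from the text.
    qc_pass is a hypothetical combined flag for all sample-QC exclusions."""
    ids = np.asarray(ids)
    pc1 = np.asarray(pc1, dtype=float)
    pc2 = np.asarray(pc2, dtype=float)
    qc_pass = np.asarray(qc_pass, dtype=bool)
    european = (pc1 < 0) & (pc2 > -10)                   # European definition
    white_british_range = (pc1 < -6.05) & (pc2 > -2.02)  # corresponding White British thresholds
    return {
        "european": ids[qc_pass & european],
        "non_european": ids[qc_pass & ~european],
        "white_british_range": ids[qc_pass & white_british_range],
    }
```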

Population Structure and Relatedness Inference using the GENESIS package (Matthew P. Conomos, 2019-02-20, R version 3.5)
GENESIS provides statistical methodology for analysing genetic data from samples with population structure and/or familial relatedness.
This analysis was based on a subset of 171,258 SNPs from the non-European dataset, selected with the same filters as the UK Biobank PCA (supplementary material of Bycroft et al.). SNPs with missing rate > 0.015 or MAF < 0.01 were excluded, as were markers in regions of long-range LD (as provided by the UK Biobank team). The remaining markers were pruned to an independent set with pairwise r² < 0.1, using windows of 1000 markers and a step size of 80 markers.
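The SNP selection can be sketched as a missingness/MAF filter followed by sliding-window LD pruning. This is a simplified greedy variant written for illustration. Real pipelines prune each chromosome separately and usually use a tool such as PLINK's `--indep-pairwise 1000 80 0.1`, whose rule for which SNP to drop may differ. Exclusion of long-range LD regions is omitted because it needs the UK Biobank region list.

```python
import numpy as np

def qc_filter(G, max_missing=0.015, min_maf=0.01):
    """Indices of SNPs kept after the missingness and MAF filters.
    G: samples x SNPs matrix of allele counts (0/1/2), np.nan = missing."""
    missing = np.isnan(G).mean(axis=0)
    af = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(af, 1.0 - af)
    keep = (missing <= max_missing) & (maf >= min_maf)
    return np.flatnonzero(keep)

def pairwise_r2(x, y):
    """Squared Pearson correlation of two SNPs over samples genotyped for both."""
    ok = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[ok], y[ok]
    if x.size < 2 or x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)

def ld_prune(G, snps, window=1000, step=80, r2_max=0.1):
    """Greedy sliding-window pruning: within each window of `window` SNPs (advanced by `step`),
    drop the later SNP of any retained pair with r^2 > r2_max.
    Simplification: SNPs are assumed to be in genomic order on a single chromosome."""
    snps = np.asarray(snps)
    keep = np.ones(len(snps), dtype=bool)
    for start in range(0, len(snps), step):
        idx = [i for i in range(start, min(start + window, len(snps))) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and pairwise_r2(G[:, snps[i]], G[:, snps[j]]) > r2_max:
                    keep[j] = False
    return snps[keep]
```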
Two rounds of principal component and relatedness calculation were performed in this project:
• Principal component analysis was performed with the PC-AiR algorithm, which accounts for known or cryptic relatedness, to obtain PCs that capture population structure rather than family structure (Figure 2) (Conomos, Miller, & Thornton, 2015). The first 32 PCs and the corresponding eigenvalues are provided.
• Relatedness was then re-estimated with the PC-Relate method, adjusting for these more precise PCs (Conomos, Reiner, Weir, & Thornton, 2016). The list of relatives up to 3rd degree, based on these PC-adjusted relatedness estimates, is provided together with the corresponding kinship values.
Compared with the UK Biobank-provided metrics, a substantial number of individuals were reclassified with respect to their relatedness (Table 1).
Figure 2. First 4 principal components from PCA of the non-European UK Biobank dataset (PC-AiR), coloured according to self-reported ethnic background.
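The key idea of PC-AiR is to compute PCs on a mutually unrelated, ancestry-representative subset and then project the relatives onto those axes. This keeps family structure from driving the top PCs. The sketch shows only that projection step. It assumes the unrelated subset has already been chosen; GENESIS does this using kinship and ancestry-divergence estimates. Missing genotypes are not handled. The eigenvalue scaling (eigenvalues of ZZᵀ/M, with M the number of SNPs) is one common convention and may differ from the values returned by GENESIS's `pcair()`.

```python
import numpy as np

def pcair_like_projection(G, unrelated, n_pcs=4):
    """G: samples x SNPs allele-count matrix (0/1/2, no missing values).
    unrelated: boolean mask of a mutually unrelated subset (assumed already chosen).
    Returns PC scores for all samples and the top eigenvalues."""
    af = G[unrelated].mean(axis=0) / 2.0
    poly = (af > 0) & (af < 1)                     # drop SNPs monomorphic in the unrelated set
    G, af = G[:, poly], af[poly]
    Z = (G - 2.0 * af) / np.sqrt(2.0 * af * (1.0 - af))  # standardised genotypes
    Zu = Z[unrelated]
    _, s, vt = np.linalg.svd(Zu, full_matrices=False)    # PCA on the unrelated subset only
    loadings = vt[:n_pcs].T
    pcs = Z @ loadings                             # relatives are projected, not used to define axes
    eigenvalues = s[:n_pcs] ** 2 / Z.shape[1]
    return pcs, eigenvalues
```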

Summary statistics (computed with SNPTEST)
• Non-Europeans (minor allele frequency, INFO score): two analyses were performed, one with the whole dataset and one after excluding second-degree relatives (based on kinship information provided by UK Biobank)
• Europeans (minor allele frequency, INFO score, Hardy-Weinberg equilibrium): two analyses were performed, for the related and unrelated subsets, as above
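Two steps in these analyses can be illustrated: excluding relatives from a list of kinship pairs, and computing MAF and a Hardy-Weinberg test for one SNP. This is not how SNPTEST or UK Biobank compute them. The text does not specify the degree cut-offs; the ones used here are the widely used KING-style thresholds (powers of 2 between expected kinship values). The rule of dropping the second member of each pair is a hypothetical simplification. The HWE test is a 1-df chi-square on hard-called genotypes, whereas SNPTEST works with imputed genotype probabilities and may use a different test.

```python
import math
import numpy as np

# KING-style kinship cut-offs: (lower bound, degree). Degree 0 = duplicate / MZ twin.
KINSHIP_CUTOFFS = [(2 ** -1.5, 0), (2 ** -2.5, 1), (2 ** -3.5, 2), (2 ** -4.5, 3)]

def relationship_degree(kinship):
    """Degree of relationship implied by a kinship coefficient, or None if beyond 3rd degree."""
    for cutoff, degree in KINSHIP_CUTOFFS:
        if kinship > cutoff:
            return degree
    return None

def relatives_to_exclude(pairs, max_degree=2):
    """pairs: iterable of (id1, id2, kinship). Hypothetical greedy rule: for every pair related
    at max_degree or closer that is not already broken, drop the second member."""
    removed = set()
    for id1, id2, kinship in pairs:
        degree = relationship_degree(kinship)
        if degree is not None and degree <= max_degree and id1 not in removed and id2 not in removed:
            removed.add(id2)
    return removed

def maf_and_hwe_p(genotypes):
    """MAF and 1-df chi-square HWE p-value from hard-called genotypes (0/1/2, NaN = missing)."""
    g = np.asarray(genotypes, dtype=float)
    g = g[~np.isnan(g)]
    n = g.size
    if n == 0:
        raise ValueError("no called genotypes")
    observed = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
    q = (observed[1] + 2 * observed[2]) / (2 * n)        # frequency of the counted allele
    maf = float(min(q, 1 - q))
    if maf == 0:
        return 0.0, 1.0                                  # monomorphic: HWE test undefined
    expected = n * np.array([(1 - q) ** 2, 2 * q * (1 - q), q ** 2])
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    return maf, math.erfc(math.sqrt(chi2 / 2))           # upper tail of chi-square with 1 df
```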